Approximation Algorithms for Bregman Co-clustering and Tensor Clustering*

Stefanie Jegelka
MPI for Biological Cybernetics
72070 Tübingen, Germany

Suvrit Sra
MPI for Biological Cybernetics
72070 Tübingen, Germany

Arindam Banerjee 
Univ. of Minnesota 
MN 55455, USA 



Abstract 

In the past few years powerful generalizations to the Euclidean k-means problem have been made, such as Bregman clustering [?], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [?], [3], and tensor clustering [3, ?]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximation algorithms of varying degrees of sophistication for k-means, k-medians, and more recently also for Bregman clustering [?]. However, there seem to be no approximation algorithms for Bregman co- and tensor clustering. In this paper we derive the first (to our knowledge) guaranteed methods for these increasingly important clustering settings. Going beyond Bregman divergences, we also prove an approximation factor for tensor clustering with arbitrary separable metrics. Through extensive experiments we evaluate the characteristics of our method, and show that it also has practical impact.



1 Introduction 

Partitioning data points into clusters is a fundamentally hard problem. The well-known Euclidean k-means problem, which partitions the input data points (vectors in $\mathbb{R}^d$) into $K$ clusters while minimizing the sum of their squared distances to the corresponding cluster centroids, is NP-hard [19] (exponential in $d$). However, simple and frequently used procedures that rapidly obtain local minima have existed for a long time [?], [28].

Because of its wide applicability and importance, the Euclidean k-means problem has been 
generalized in several directions. Specific examples relevant to this paper include: 

• Bregman clustering [?], where instead of minimizing squared Euclidean distances one minimizes Bregman divergences (which are generalized distance functions, see [13, 10] for details),

• Bregman co-clustering [?] (which includes both Euclidean [16] and information-theoretic co-clustering [?] as special cases), where the set of input vectors is viewed as a matrix and one simultaneously clusters rows and columns to obtain coherent submatrices (co-clusters), while minimizing a Bregman divergence, and

• Tensor clustering or multiway clustering [?], especially the version based on Bregman divergences [8], where one simultaneously clusters along various dimensions of the input tensor.

For these problems too, the commonly used heuristics perform well, but they do not provide theoretical guarantees (or at best assure local optimality). For k-means type clustering problems, i.e., problems that group together input vectors into clusters while minimizing "distance" to cluster centroids, there exist several algorithms that approximate a globally optimal solution. We refer the reader to [?], [?], and the numerous references therein for more details.

*A part of the theory of this paper appeared in MPI-TR-#177 [35].

In stark contrast, approximation algorithms for tensor clustering are much less studied. We are aware of only two very recent attempts (both papers are from 2008) for the two-dimensional special case of co-clustering, namely, [4] and [31], and both papers follow similar approaches to obtain their approximation guarantees. Both prove a $2\alpha_1$-approximation for Euclidean co-clustering; in addition, Puolamäki et al. [31] prove a factor of $(1 + \sqrt{2})\alpha_1$ for binary matrices and an $\ell_1$-norm objective, and Anagnostopoulos et al. [4] a factor of $3\alpha_1$ for co-clustering real matrices with $\ell_p$ norms. In all factors, $\alpha_1$ is an approximation guarantee for clustering either rows or columns. In this paper, we build upon [?] and obtain approximation algorithms for tensor clustering with Bregman divergences and arbitrary separable metrics such as $\ell_p$-norms. The latter result is of particular interest for $\ell_1$-norm based tensor clustering, which may be viewed as a generalization of k-medians to tensors. In the terminology of [9], we focus on the "block average" versions of co- and tensor clustering.

Additional discussion and relevant references for co-clustering can be found in [?], while for the lesser-known problem of tensor clustering more background can be gained by referring to [?].



1.1 Contributions 

The main contribution of this paper is the analysis of an approximation algorithm for tensor clustering that achieves an approximation ratio of $O(m\alpha)$, where $m$ is the order of the tensor and $\alpha$ is the approximation factor of a corresponding 1D clustering algorithm. Our results apply to a fairly broad class of objective functions, including metrics such as $\ell_p$ norms or Hilbertian metrics [24], [33], and divergence functions such as Bregman divergences [?] (with some assumptions). As corollaries, our results solve two open problems posed by [?], viz., whether their methods for Euclidean co-clustering could be extended to Bregman co-clustering, and whether one could extend the approximation guarantees to tensor clustering. Owing to the structure of the algorithm, our results also give insight into properties of the tensor clustering problem as such, namely, a bound on the amount of information inherent in the joint consideration of several dimensions.

In addition, we provide extensive experimental validation of the theoretical claims, which forms 
an additional contribution of this paper. 



2 Background 

Traditionally, "center"-based clustering algorithms seek partitions of the columns of an input matrix $X = [x_1, \ldots, x_n]$ into clusters $\mathcal{C} = \{C_1, \ldots, C_K\}$, and find "centers" $\mu_k$ that minimize the objective

$$J(\mathcal{C}) = \sum_{k=1}^{K} \sum_{x \in C_k} d(x, \mu_k), \tag{2.1}$$

where the function $d(x, y)$ measures cluster quality. The "center" $\mu_k$ of cluster $C_k$ is given by the mean of the points in $C_k$ when $d(x, y)$ is a Bregman divergence [7]. Co-clustering extends (2.1) to seek simultaneous partitions (and centers $\mu_{IJ}$) of rows and columns of $X$, so that the objective function

$$J(\mathcal{C}) = \sum_{I, J} \sum_{i \in I,\, j \in J} d(x_{ij}, \mu_{IJ}) \tag{2.2}$$

is minimized; $\mu_{IJ}$ denotes the (scalar) "center" of the cluster described by the row and column index sets, viz., $I$ and $J$. Formulation (2.2) is easily generalized to tensors, as shown in Section 2.2 below. However, we first recall basic notation about tensors before formally presenting the tensor clustering problem. Tensors are well studied in multilinear algebra [22], and they are gaining importance in both data mining and machine learning applications [?].
2.1 Tensors

A large part of the material in this section is derived from the well-written paper of de Silva and Lim [17]; their notation turns out to be particularly suitable for our analysis. An order-$m$ tensor $A$ may be viewed as an element of the vector space $\mathbb{R}^{n_1 \times \cdots \times n_m}$ (in this paper we denote matrices and tensors using sans-serif letters). An individual component of the tensor $A$ is represented by the multiply-indexed value $a_{i_1 i_2 \ldots i_m}$, where $i_j \in \{1, \ldots, n_j\}$ for $1 \le j \le m$.



Multilinear matrix multiplication 

For us the most important operation on tensors is that of multilinear matrix multiplication, which is a generalization of the familiar concept of matrix multiplication. Matrices act on other matrices by either left or right multiplication. For an order-3 tensor, there are three dimensions on which a matrix may act via matrix multiplication. For example, for an order-3 tensor $A \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and three matrices $P \in \mathbb{R}^{p_1 \times n_1}$, $Q \in \mathbb{R}^{p_2 \times n_2}$, and $R \in \mathbb{R}^{p_3 \times n_3}$, multilinear matrix multiplication is the operation defined by the action of these three matrices on the different dimensions of $A$ that yields the tensor $A' \in \mathbb{R}^{p_1 \times p_2 \times p_3}$. Formally, the entries of the tensor $A'$ are given by

$$a'_{lmn} = \sum_{i,j,k=1}^{n_1, n_2, n_3} p_{li}\, q_{mj}\, r_{nk}\, a_{ijk}, \tag{2.3}$$

and this operation is written compactly as

$$A' = (P, Q, R) \cdot A. \tag{2.4}$$

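Multilinear multiplication extends naturally to tensors of arbitrary order. If $A \in \mathbb{R}^{n_1 \times \cdots \times n_m}$ and $P_1 \in \mathbb{R}^{p_1 \times n_1}, \ldots, P_m \in \mathbb{R}^{p_m \times n_m}$, then $A' = (P_1, \ldots, P_m) \cdot A \in \mathbb{R}^{p_1 \times \cdots \times p_m}$ has components

$$a'_{i_1 \ldots i_m} = \sum_{j_1, \ldots, j_m = 1}^{n_1, \ldots, n_m} p^{(1)}_{i_1 j_1} \cdots p^{(m)}_{i_m j_m}\, a_{j_1 \ldots j_m}, \tag{2.5}$$

where $p^{(k)}_{ij}$ denotes the $ij$-th entry of matrix $P_k$.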

Example 2.1 (Matrix Multiplication). Let $A \in \mathbb{R}^{n_1 \times n_2}$, $P \in \mathbb{R}^{p \times n_1}$, and $Q \in \mathbb{R}^{q \times n_2}$ be three matrices. The matrix product $PAQ^\top$ can be written as the multilinear multiplication $(P, Q) \cdot A$.
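Definition (2.5) and Example 2.1 translate directly into array code. The following is a minimal sketch assuming NumPy; the helper name `multilinear` is ours and not part of the paper. It applies $P_k$ along mode $k$ and checks the result against $PAQ^\top$ and against a direct evaluation of the sum in (2.3).

```python
import numpy as np


def multilinear(mats, A):
    """Return (P_1, ..., P_m) . A as in (2.5): matrix P_k acts on mode k of A."""
    out = np.asarray(A, dtype=float)
    for mode, P in enumerate(mats):
        # tensordot sums P's column index against axis `mode` and places P's
        # row index first; moveaxis puts that new axis back at position `mode`.
        out = np.moveaxis(np.tensordot(P, out, axes=(1, mode)), 0, mode)
    return out


rng = np.random.default_rng(0)

# Example 2.1: for a matrix A, (P, Q) . A equals P A Q^T.
A = rng.standard_normal((4, 5))
P = rng.standard_normal((2, 4))
Q = rng.standard_normal((3, 5))
print(np.allclose(multilinear([P, Q], A), P @ A @ Q.T))  # True

# Order-3 case: compare with an explicit evaluation of the sum in (2.3).
A3 = rng.standard_normal((3, 4, 5))
P3 = rng.standard_normal((2, 3))
Q3 = rng.standard_normal((6, 4))
R3 = rng.standard_normal((2, 5))
direct = np.einsum("li,mj,nk,ijk->lmn", P3, Q3, R3, A3)
print(np.allclose(multilinear([P3, Q3, R3], A3), direct))  # True
```

Applying the matrices one mode at a time avoids ever forming the full sum over all index combinations explicitly.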

Proposition 2.2 (Basic Properties). The following properties of multilinear multiplication are easily verified (and generalized to tensors of arbitrary order):

1. Linearity: Let $\alpha, \beta \in \mathbb{R}$, and $A$ and $B$ be tensors with the same dimensions; then

$$(P, Q) \cdot (\alpha A + \beta B) = \alpha\, (P, Q) \cdot A + \beta\, (P, Q) \cdot B.$$

2. Product rule: For matrices $P_1, P_2, Q_1, Q_2$ of appropriate dimensions, and a tensor $A$,

$$(P_1, P_2) \cdot \big((Q_1, Q_2) \cdot A\big) = (P_1 Q_1, P_2 Q_2) \cdot A.$$

3. Multilinearity: Let $\alpha, \beta \in \mathbb{R}$, and $P$, $Q$, and $R$ be matrices of appropriate dimensions. Then, for a tensor $A$ the following holds:

$$(P, \alpha Q + \beta R) \cdot A = \alpha\, (P, Q) \cdot A + \beta\, (P, R) \cdot A.$$

Vector Norms 

The standard vector $\ell_p$-norms can be easily extended to tensors, and are defined as

$$\|A\|_p = \Big(\sum_{i_1, \ldots, i_m} |a_{i_1 \ldots i_m}|^p\Big)^{1/p} \tag{2.6}$$

for $p \ge 1$. In particular, for $p = 2$ we get the "Frobenius" norm, also written as $\|A\|_F$.
Inner Product

The Frobenius norm induces an inner product that can be defined as

$$\langle A, B \rangle = \sum_{i_1, \ldots, i_m} a_{i_1 \ldots i_m}\, b_{i_1 \ldots i_m}, \tag{2.7}$$

so that $\|A\|_F^2 = \langle A, A \rangle$ holds as usual.

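Proposition 2.3. The following property of this inner product is easily verified (a generalization of the familiar property $\langle Ax, By \rangle = \langle x, A^\top B y \rangle$ for vectors):

$$\big\langle (P_1, \ldots, P_m) \cdot A,\; (Q_1, \ldots, Q_m) \cdot B \big\rangle = \big\langle A,\; (P_1^\top Q_1, \ldots, P_m^\top Q_m) \cdot B \big\rangle. \tag{2.8}$$

Proof: Using definition (2.5) and the inner-product rule (2.7) we have

$$\begin{aligned}
\big\langle (P_1, \ldots, P_m) \cdot A, (Q_1, \ldots, Q_m) \cdot B \big\rangle
&= \sum_{i_1, \ldots, i_m} \Big(\sum_{j_1, \ldots, j_m} p^{(1)}_{i_1 j_1} \cdots p^{(m)}_{i_m j_m} a_{j_1 \ldots j_m}\Big) \Big(\sum_{k_1, \ldots, k_m} q^{(1)}_{i_1 k_1} \cdots q^{(m)}_{i_m k_m} b_{k_1 \ldots k_m}\Big)\\
&= \sum_{\substack{j_1, \ldots, j_m\\ k_1, \ldots, k_m}} \Big(\sum_{i_1} p^{(1)}_{i_1 j_1} q^{(1)}_{i_1 k_1}\Big) \cdots \Big(\sum_{i_m} p^{(m)}_{i_m j_m} q^{(m)}_{i_m k_m}\Big) a_{j_1 \ldots j_m} b_{k_1 \ldots k_m}\\
&= \sum_{\substack{j_1, \ldots, j_m\\ k_1, \ldots, k_m}} (P_1^\top Q_1)_{j_1 k_1} \cdots (P_m^\top Q_m)_{j_m k_m}\, a_{j_1 \ldots j_m} b_{k_1 \ldots k_m} = \sum_{j_1, \ldots, j_m} a_{j_1 \ldots j_m} b'_{j_1 \ldots j_m} = \langle A, B' \rangle,
\end{aligned}$$

where $B' = (P_1^\top Q_1, \ldots, P_m^\top Q_m) \cdot B$.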
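Property (2.8) is easy to sanity-check numerically. The sketch below (NumPy assumed; `multilinear` and `inner` are illustrative helpers, not from the paper) compares both sides of (2.8) for random order-3 tensors and random matrices $P_k$, $Q_k$ with matching row counts.

```python
import numpy as np


def multilinear(mats, A):
    """(P_1, ..., P_m) . A, applying P_k along mode k (see (2.5))."""
    out = np.asarray(A, dtype=float)
    for mode, P in enumerate(mats):
        out = np.moveaxis(np.tensordot(P, out, axes=(1, mode)), 0, mode)
    return out


def inner(A, B):
    """Frobenius inner product (2.7): sum of entrywise products."""
    return float(np.sum(A * B))


rng = np.random.default_rng(1)
shape = (3, 4, 2)
A = rng.standard_normal(shape)
B = rng.standard_normal(shape)
# P_k and Q_k need the same number of rows so that both sides of (2.8) are defined.
Ps = [rng.standard_normal((5, n)) for n in shape]
Qs = [rng.standard_normal((5, n)) for n in shape]

lhs = inner(multilinear(Ps, A), multilinear(Qs, B))
rhs = inner(A, multilinear([P.T @ Q for P, Q in zip(Ps, Qs)], B))
print(np.isclose(lhs, rhs))  # True
```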



Divergence 

Finally, we define an arbitrary divergence function $d(X, Y)$ between two $m$-dimensional tensors $X$, $Y$ as an elementwise sum of individual divergences, i.e.,

$$d(X, Y) = \sum_{i_1, \ldots, i_m} d(x_{i_1 \ldots i_m}, y_{i_1 \ldots i_m}), \tag{2.9}$$

and we will define the scalar divergence $d(x, y)$ as the need arises.



2.2 Tensor clustering 

Let $A \in \mathbb{R}^{n_1 \times \cdots \times n_m}$ be an order-$m$ tensor that we wish to partition into coherent sub-tensors (or clusters). A basic approach is to minimize the sum of the divergences between individual (scalar) elements in each cluster and their corresponding (scalar) cluster "centers". Readers familiar with [9] will recognize this to be a "block-average" variant of tensor clustering.

Assume that each dimension $j$ ($1 \le j \le m$) is partitioned into $k_j$ clusters. Let $C_j \in \{0, 1\}^{n_j \times k_j}$ be the cluster indicator matrix for dimension $j$, where the $ik$-th entry of such a matrix is one if and only if index $i$ belongs to the $k$-th cluster ($1 \le k \le k_j$) for dimension $j$. Then, the tensor clustering problem is (cf. (2.2)):

$$\underset{C_1, \ldots, C_m,\, M}{\text{minimize}}\; d\big(A, (C_1, \ldots, C_m) \cdot M\big), \quad \text{s.t. } C_j \in \{0, 1\}^{n_j \times k_j}, \tag{2.10}$$

where the tensor $M$ collects all the cluster "centers."
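To make (2.10) concrete, the sketch below (NumPy assumed; `indicator` and `objective` are hypothetical helper names) builds indicator matrices $C_j$ from per-dimension labels and forms $(C_1, \ldots, C_m) \cdot M$. Each entry of the result is the center of the block it belongs to, which is exactly the block-average model; the default scalar divergence is the squared difference, chosen only for illustration.

```python
import numpy as np


def multilinear(mats, A):
    """(P_1, ..., P_m) . A, applying P_k along mode k (see (2.5))."""
    out = np.asarray(A, dtype=float)
    for mode, P in enumerate(mats):
        out = np.moveaxis(np.tensordot(P, out, axes=(1, mode)), 0, mode)
    return out


def indicator(labels, k):
    """C in {0,1}^{n x k} with C[i, c] = 1 iff labels[i] == c."""
    return np.eye(k)[np.asarray(labels)]


def objective(A, Cs, M, d=lambda x, y: (x - y) ** 2):
    """Separable objective d(A, (C_1, ..., C_m) . M); squared difference by default."""
    return float(np.sum(d(A, multilinear(Cs, M))))


# Hypothetical 4 x 3 x 2 tensor clustered into 2 x 2 x 1 blocks.
labels = [[0, 0, 1, 1], [0, 1, 1], [0, 0]]
ks = [2, 2, 1]
Cs = [indicator(lab, k) for lab, k in zip(labels, ks)]
M = np.arange(4.0).reshape(2, 2, 1)      # one scalar "center" per block
B = multilinear(Cs, M)                   # every entry carries its block's center
print(B[3, 2, 1], M[1, 1, 0])            # 3.0 3.0
print(objective(B, Cs, M))               # 0.0
```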

3 Algorithm and Analysis 

Given formulation (2.10), our algorithm, which we name Combination Tensor Clustering (CoTeC), follows the simple outline:

1. Cluster along each dimension $j$, using an approximation algorithm to obtain clustering $C_j$. Let $C = (C_1, \ldots, C_m)$.

2. Compute $M = \operatorname{argmin}_{X \in \mathbb{R}^{k_1 \times \cdots \times k_m}} d(A, C \cdot X)$.

3. Return the tensor clustering $(C_1, \ldots, C_m)$ (with representatives $M$).
Instead of clustering one dimension at a time, we can also cluster along $t$ dimensions at a time, which we will call $t$-dimensional clustering. For an order-$m$ tensor, with $t = 1$ we form groups of order-$(m-1)$ tensors. For illustration, consider an order-3 tensor $A$ for which we group matrices when $t = 1$. For the first dimension we cluster the objects $A(i, :, :)$ (using Matlab notation) to obtain cluster indicators $C_1$; we repeat the procedure for the second and third dimensions. The approximate tensor clustering will be the combination $(C_1, C_2, C_3)$. As we assumed the Bregman divergences to be separable, the sub-tensors, e.g., $A(i, :, :)$, can be simply treated as vectors.
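The three CoTeC steps fit in a few lines. The sketch below is ours, not the authors' code: it assumes NumPy, uses squared Euclidean distance with $t = 1$ (so the optimal $M$ in step 2 is the tensor of block means), and plugs in k-means with $D^2$ seeding as a stand-in for the 1D approximation algorithm; `kmeans` and `cotec` are hypothetical names. As in the paragraph above, each slice along dimension $j$ is flattened into a vector.

```python
import numpy as np


def multilinear(mats, A):
    """(P_1, ..., P_m) . A, applying P_k along mode k (see (2.5))."""
    out = np.asarray(A, dtype=float)
    for mode, P in enumerate(mats):
        out = np.moveaxis(np.tensordot(P, out, axes=(1, mode)), 0, mode)
    return out


def kmeans(X, k, rng, iters=25):
    """Stand-in 1D method (our choice): D^2 seeding followed by Lloyd updates on the rows of X."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    while len(centers) < k:
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        p = d2 / d2.sum() if d2.sum() > 0 else np.full(n, 1.0 / n)
        centers.append(X[rng.choice(n, p=p)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # leave the center of an empty cluster unchanged
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def cotec(A, ks, rng):
    """CoTeC sketch for t = 1 and squared Euclidean distance."""
    Cs = []
    for j, k in enumerate(ks):
        # Step 1: cluster the slices A(i, :, ..., :) along dimension j, each flattened to a vector.
        X = np.moveaxis(A, j, 0).reshape(A.shape[j], -1)
        Cs.append(np.eye(k)[kmeans(X, k, rng)])
    # Step 2: for squared Euclidean distance the minimizing M is the tensor of block means.
    sums = multilinear([C.T for C in Cs], A)
    counts = multilinear([C.T for C in Cs], np.ones_like(A))
    M = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    # Step 3: return the clustering and its representatives.
    return Cs, M


rng = np.random.default_rng(0)
A = rng.standard_normal((12, 10, 8))
Cs, M = cotec(A, ks=(3, 2, 2), rng=rng)
J = float(((A - multilinear(Cs, M)) ** 2).sum())
print(M.shape)  # (3, 2, 2)
```

Computing the block sums as $(C_1^\top, \ldots, C_m^\top) \cdot A$ keeps step 2 in the same multilinear notation as (2.10).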

Apart from [4], [31], all approximation guarantees refer to one-dimensional clustering algorithms. Any one-dimensional approximation algorithm can be used as a base method for our scheme outlined above. For example, the method of Ackermann and Blömer or the more practical Bregman clustering approaches of [30, 35]¹ are two potential choices, though with different approximation factors. Clustering along individual dimensions and then combining the results to obtain a tensor clustering might seem counterintuitive to the idea of "co"-clustering, where one simultaneously clusters along different dimensions. However, our analysis will show that dimension-wise clustering suffices to obtain strong approximation guarantees for tensor clustering, a fact often observed empirically too. At the same time, our Theorem 3.1 bounds the amount of information that can lie in the simultaneous consideration of multiple dimensions.



3.1 Results 

The main contribution of this paper is the following approximation guarantee for CoTeC, which we 
prove in the remainder of this section. 

Theorem 3.1 (Approximation). Let $A$ be an order-$m$ tensor and let $C_j$ denote its clustering along the $j$th subset of $t$ dimensions ($1 \le j \le m/t$), as obtained from a multiway clustering algorithm with guarantee² $\alpha_t$. Let $C = (C_1, \ldots, C_{m/t})$ denote the induced tensor clustering, and $J_{\mathrm{OPT}}(m)$ the objective value of the best $m$-dimensional clustering. Then,

$$J(C) \le p(m/t)\, \rho_d\, \alpha_t\, J_{\mathrm{OPT}}(m), \tag{3.1}$$

with

1. $\rho_d = 1$ and $p(m/t) = 2^{\log(m/t)}$ if $d(x, y) = (x - y)^2$,

2. $\rho_d = 1$ and $p(m/t) = 2m/t$ if $d(x, y)$ is a metric.³

Theorem 3.1 is quite general, and it can be combined with some natural assumptions (see Section 3.5) to yield results for tensor clustering with more general divergence functions too (here $\rho_d$ might be greater than 1).



3.2 Analysis: Theorem 3.1, Euclidean case

We begin our proof with the Euclidean case, i.e., $d(x, y) = (x - y)^2$. Our proof is inspired by the techniques of [?]. We establish that given a clustering algorithm which clusters along $t$ of the $m$ dimensions at a time⁴ with an approximation factor of $\alpha_t$, our CoTeC algorithm achieves an objective within a factor $O(\lceil m/t \rceil \alpha_t)$ of the optimal. For example, for $t = 1$ we can use the seeding methods of [?] or the stronger approximation algorithms of [1]. We assume without loss of generality (wlog) that $m = 2^h t$ for an integer $h$ (otherwise, pad in empty dimensions).

¹ Both [30, 35] discovered essentially the same method for Bregman clustering, though the analysis of [30] is somewhat sharper.

² We say an approximation algorithm has guarantee $\alpha$ if it yields a solution that achieves an objective value within a factor $O(\alpha)$ of the optimum.

³ The results can be trivially extended to $\lambda$-relaxed metrics that satisfy $d(x, y) \le \lambda\big(d(x, z) + d(z, y)\big)$; the corresponding approximation factor just gets scaled by $\lambda$.

⁴ One could also consider clustering differently sized subsets of the dimensions, say $\{t_1, \ldots, t_r\}$, where $t_1 + \cdots + t_r = m$. However, this requires unilluminating notational jugglery, which we can skip for simplicity of exposition.

Since for the squared Frobenius norm each cluster "center" is given by the mean, we can recast Problem (2.10) into a more convenient form. To that end, note that the individual entries of the means tensor $M$ are given by

$$\mu_{I_1 \ldots I_m} = \frac{1}{|I_1| \cdots |I_m|} \sum_{i_1 \in I_1, \ldots, i_m \in I_m} a_{i_1 \ldots i_m}, \tag{3.2}$$

with index sets $I_j$ for $1 \le j \le m$. Let $\tilde{C}_j$ be the normalized cluster indicator matrix obtained by normalizing the columns of $C_j$, so that $\tilde{C}_j^\top \tilde{C}_j = I_{k_j}$. Then, we can rewrite (2.10) in terms of projection matrices $P_j$ as:

$$\underset{C = (C_1, \ldots, C_m)}{\text{minimize}}\; J(C) = \|A - (P_1, \ldots, P_m) \cdot A\|_F^2, \quad \text{s.t. } P_j = \tilde{C}_j \tilde{C}_j^\top. \tag{3.3}$$
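The equivalence between the block means (3.2) and the projected tensor in (3.3) can be checked numerically. In this NumPy sketch (helper names are ours), $P_j = \tilde{C}_j \tilde{C}_j^\top$ has entry $1/|I|$ for index pairs in the same cluster $I$ and 0 otherwise, so $(P_1, \ldots, P_m) \cdot A$ replaces every entry by its block mean. The sketch assumes every cluster is non-empty, since otherwise a column of $C_j$ cannot be normalized.

```python
import numpy as np


def multilinear(mats, A):
    """(P_1, ..., P_m) . A, applying P_k along mode k (see (2.5))."""
    out = np.asarray(A, dtype=float)
    for mode, P in enumerate(mats):
        out = np.moveaxis(np.tensordot(P, out, axes=(1, mode)), 0, mode)
    return out


def projection(labels, k):
    """P_j = C~ C~^T, where C~ is the column-normalized indicator (clusters assumed non-empty)."""
    C = np.eye(k)[np.asarray(labels)]
    C_tilde = C / np.sqrt(C.sum(axis=0))     # column c scaled by 1/sqrt(|cluster c|)
    return C_tilde @ C_tilde.T


rng = np.random.default_rng(2)
A = rng.standard_normal((6, 5, 4))
labels = [np.array([0, 0, 1, 1, 2, 2]), np.array([0, 1, 0, 1, 1]), np.array([0, 0, 1, 1])]
ks = [3, 2, 2]
Ps = [projection(lab, k) for lab, k in zip(labels, ks)]
PA = multilinear(Ps, A)

# Direct evaluation of (3.2): replace every entry by the mean of its block.
direct = np.empty_like(A)
for idx in np.ndindex(*A.shape):
    block = np.ix_(*[lab == lab[i] for lab, i in zip(labels, idx)])
    direct[idx] = A[block].mean()

J = float(np.sum((A - PA) ** 2))             # objective (3.3)
print(np.allclose(PA, direct))               # True
```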

Lemma 3.2 (Pythagorean). Let $P = (P_1, \ldots, P_t)$, $S = (P_{t+1}, \ldots, P_m)$, and $P^\perp = (I - P_1, \ldots, I - P_t)$ be collections of projection matrices $P_j$. Then,

$$\|(P, S) \cdot A + (P^\perp, R) \cdot B\|_F^2 = \|(P, S) \cdot A\|_F^2 + \|(P^\perp, R) \cdot B\|_F^2,$$

where $R$ is a collection of $m - t$ projection matrices.

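Proof. Using $\|A\|_F^2 = \langle A, A \rangle$ we can rewrite the l.h.s. as

$$\|(P, S) \cdot A + (P^\perp, R) \cdot B\|_F^2 = \|(P, S) \cdot A\|_F^2 + \|(P^\perp, R) \cdot B\|_F^2 + 2\big\langle (P, S) \cdot A, (P^\perp, R) \cdot B \big\rangle.$$

The last term is immediately seen to be zero using Property (2.8) and the fact that $P_j^\top (I - P_j) = P_j (I - P_j) = 0$, since orthogonal projections are symmetric and idempotent. □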
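A quick numerical check of Lemma 3.2 (not a proof), assuming NumPy: with $m = 3$ and $t = 1$, the two summands are built from $P_1$ and $I - P_1$ on the first mode and arbitrary orthogonal projections elsewhere, so their inner product vanishes by (2.8). The projections here are onto random subspaces rather than coming from clusterings, which the lemma also covers; `random_projection` is a hypothetical helper.

```python
import numpy as np


def multilinear(mats, A):
    """(P_1, ..., P_m) . A, applying P_k along mode k (see (2.5))."""
    out = np.asarray(A, dtype=float)
    for mode, P in enumerate(mats):
        out = np.moveaxis(np.tensordot(P, out, axes=(1, mode)), 0, mode)
    return out


def random_projection(n, k, rng):
    """Orthogonal projection onto a random k-dimensional subspace of R^n."""
    U, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return U @ U.T


rng = np.random.default_rng(3)
shape = (6, 5, 4)
A, B = rng.standard_normal(shape), rng.standard_normal(shape)

# m = 3, t = 1: P = (P_1), S = (P_2, P_3), P_perp = (I - P_1), R arbitrary projections.
P1 = random_projection(6, 2, rng)
S = [random_projection(5, 2, rng), random_projection(4, 3, rng)]
R = [random_projection(5, 3, rng), random_projection(4, 1, rng)]

first = multilinear([P1] + S, A)
second = multilinear([np.eye(6) - P1] + R, B)
lhs = np.sum((first + second) ** 2)
rhs = np.sum(first ** 2) + np.sum(second ** 2)
print(np.isclose(lhs, rhs))  # True
```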

Some more notation: Since we cluster along $t$ dimensions at a time, we recursively partition the initial set of all $m$ dimensions until (after $\log(m/t) + 1$ steps) the sets of dimensions have length $t$. Let $l$ denote the level of recursion, starting at $l = \log(m/t) = h$ and going down to $l = 0$. At level $l$, the sets of dimensions will have length $2^l t$ (so that for $l = 0$ we have $t$ dimensions). We represent each clustering along a subset of $2^l t$ dimensions by its corresponding $2^l t$ projection matrices. We gather these projection matrices into the collection $\mathbf{P}_i^l$ (note boldface), where the index $i$ ranges from 1 to $2^{h-l}$.

Example 3.3. Consider an order-8 tensor where we group $t = 2$ dimensions at a time. Then, $h = \log(m/t) = 2$ and we have 3 levels. We recursively divide the set of dimensions in the middle, i.e., $\{1, \ldots, 8\}$ into $\{1, \ldots, 4\}$ and $\{5, \ldots, 8\}$ and so on, ending with $\{\{1, 2\}, \{3, 4\}, \{5, 6\}, \{7, 8\}\}$. The projection matrix for dimension $i$ is $P_i$, and the full tensor clustering is represented by $(P_1, \ldots, P_8)$. For each level $l = 0, 1, 2$, the individual collections of projection matrices $\mathbf{P}_i^l$ are

$$\begin{aligned}
\mathbf{P}_1^2 &= (P_1, P_2, P_3, P_4, P_5, P_6, P_7, P_8),\\
\mathbf{P}_1^1 &= (P_1, P_2, P_3, P_4), \quad \mathbf{P}_2^1 = (P_5, P_6, P_7, P_8),\\
\mathbf{P}_1^0 &= (P_1, P_2), \quad \ldots, \quad \mathbf{P}_4^0 = (P_7, P_8).
\end{aligned}$$
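The recursive halving in Example 3.3 can be generated mechanically. This plain-Python sketch (the function name `level_groups` is hypothetical) lists the dimension groups behind $\mathbf{P}_i^l$ for every level $l$, assuming $m = 2^h t$ as in the text.

```python
def level_groups(m, t):
    """Dimension groups per recursion level l = h, ..., 0 for m = 2^h t (dimensions numbered from 1)."""
    h = (m // t).bit_length() - 1
    if t * 2 ** h != m:
        raise ValueError("pad with empty dimensions so that m = 2^h * t")
    groups = {}
    for l in range(h, -1, -1):
        size = 2 ** l * t                    # each group at level l spans 2^l t dimensions
        groups[l] = [list(range(i * size + 1, (i + 1) * size + 1)) for i in range(2 ** (h - l))]
    return groups


for level, collections in level_groups(8, 2).items():
    print(level, collections)
# 2 [[1, 2, 3, 4, 5, 6, 7, 8]]
# 1 [[1, 2, 3, 4], [5, 6, 7, 8]]
# 0 [[1, 2], [3, 4], [5, 6], [7, 8]]
```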

We also need some notation to represent a complete tensor clustering along all $m$ dimensions, where only a subset of $2^l t$ dimensions are clustered. We pad the collection $\mathbf{P}_i^l$ with $m - 2^l t$ identity matrices for the non-clustered dimensions, and call this padded collection $\mathbf{Q}_i^l$. With recursive partitioning of the dimensions, $\mathbf{Q}_i^l$ subsumes $\mathbf{Q}_j^0$ for $2^l(i-1) < j \le 2^l i$, so that

$$\mathbf{Q}_i^l = \prod_{2^l(i-1) < j \le 2^l i} \mathbf{Q}_j^0.$$

At level 0, the algorithm yields the collections $\mathbf{Q}_i^0$ and $\mathbf{P}_i^0$. The remaining clusterings are simply combinations, i.e., products, of these level-0 clusterings. We denote the collection of $m - 2^l t$ identity matrices (of appropriate size) by $I^l$, so that $\mathbf{Q}_1^l = (\mathbf{P}_1^l, I^l)$. Equipped with this notation, we now prove the main lemma that relates the combined clustering to its sub-clusterings.
prove the main lemma that relates the combined clustering to its sub-clusterings. 
Lemma 3.4. Let $A$ be an order-$m$ tensor and $m \ge 2t$. The objective function for any $2^l t$-dimensional clustering $\mathbf{P}_i^l = (\mathbf{P}^0_{2^l(i-1)+1}, \ldots, \mathbf{P}^0_{2^l i})$ can be bounded via the sub-clusterings along only one set of dimensions of size $t$ as

$$\|A - \mathbf{Q}_i^l \cdot A\|_F^2 \le 2^l \max_{2^l(i-1) < j \le 2^l i} \|A - \mathbf{Q}_j^0 \cdot A\|_F^2. \tag{3.4}$$

We can always (wlog) permute dimensions so that any set of $2^l t$ clustered dimensions maps to the first $2^l t$ ones. Hence, it suffices to prove the lemma for $i = 1$, i.e., the first $2^l t$ dimensions.

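Proof. We prove the lemma for $i = 1$ by induction on $l$.

Base: Let $l = 0$. Then $\mathbf{Q}_1^l = \mathbf{Q}_1^0$, and (3.4) holds trivially.

Induction: Assume the claim holds for $l \ge 0$. Consider a clustering $\mathbf{P}_1^{l+1} = (\mathbf{P}_1^l, \mathbf{P}_2^l)$, or equivalently $\mathbf{Q}_1^{l+1} = \mathbf{Q}_1^l \mathbf{Q}_2^l$. Using $P + P^\perp = I$, we decompose $A$ as

$$\begin{aligned}
A &= (\mathbf{P}_1^{l+1} + \mathbf{P}_1^{(l+1)\perp}, I^{l+1}) \cdot A = (\mathbf{P}_1^l + \mathbf{P}_1^{l\perp}, \mathbf{P}_2^l + \mathbf{P}_2^{l\perp}, I^{l+1}) \cdot A\\
&= (\mathbf{P}_1^l, \mathbf{P}_2^l, I^{l+1}) \cdot A + (\mathbf{P}_1^{l\perp}, \mathbf{P}_2^l, I^{l+1}) \cdot A + (\mathbf{P}_1^l, \mathbf{P}_2^{l\perp}, I^{l+1}) \cdot A + (\mathbf{P}_1^{l\perp}, \mathbf{P}_2^{l\perp}, I^{l+1}) \cdot A\\
&= \mathbf{Q}_1^l \mathbf{Q}_2^l \cdot A + \mathbf{Q}_1^{l\perp} \mathbf{Q}_2^l \cdot A + \mathbf{Q}_1^l \mathbf{Q}_2^{l\perp} \cdot A + \mathbf{Q}_1^{l\perp} \mathbf{Q}_2^{l\perp} \cdot A,
\end{aligned}$$

where $\mathbf{Q}_i^{l\perp} = (\mathbf{P}_i^{l\perp}, I^l)$. Since $\mathbf{Q}_1^{l+1} = \mathbf{Q}_1^l \mathbf{Q}_2^l$, the Pythagorean Property (Lemma 3.2) yields

$$\|A - \mathbf{Q}_1^{l+1} \cdot A\|_F^2 = \|\mathbf{Q}_1^{l\perp} \mathbf{Q}_2^l \cdot A\|_F^2 + \|\mathbf{Q}_1^l \mathbf{Q}_2^{l\perp} \cdot A\|_F^2 + \|\mathbf{Q}_1^{l\perp} \mathbf{Q}_2^{l\perp} \cdot A\|_F^2.$$

Combining the above equalities with the assumption (wlog)

$$\|\mathbf{Q}_1^{l\perp} \mathbf{Q}_2^l \cdot A\|_F^2 \ge \|\mathbf{Q}_1^l \mathbf{Q}_2^{l\perp} \cdot A\|_F^2,$$

we obtain the inequalities

$$\begin{aligned}
\|A - \mathbf{Q}_1^l \mathbf{Q}_2^l \cdot A\|_F^2 &\le 2\big(\|\mathbf{Q}_1^{l\perp} \mathbf{Q}_2^l \cdot A\|_F^2 + \|\mathbf{Q}_1^{l\perp} \mathbf{Q}_2^{l\perp} \cdot A\|_F^2\big)\\
&= 2\|\mathbf{Q}_1^{l\perp} \mathbf{Q}_2^l \cdot A + \mathbf{Q}_1^{l\perp} \mathbf{Q}_2^{l\perp} \cdot A\|_F^2 = 2\|\mathbf{Q}_1^{l\perp} (\mathbf{Q}_2^l + \mathbf{Q}_2^{l\perp}) \cdot A\|_F^2\\
&= 2\|\mathbf{Q}_1^{l\perp} \cdot A\|_F^2 = 2\|A - \mathbf{Q}_1^l \cdot A\|_F^2\\
&\le 2 \max_{1 \le i \le 2} \|A - \mathbf{Q}_i^l \cdot A\|_F^2 \le 2 \cdot 2^l \max_{1 \le j \le 2^{l+1}} \|A - \mathbf{Q}_j^0 \cdot A\|_F^2,
\end{aligned}$$

where the last step follows from the induction hypothesis (3.4), and the two norm terms in the first line are combined using the Pythagorean Property. □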
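Lemma 3.4 can be probed empirically. The NumPy sketch below (helper names are ours) draws random tensors and random 2-cluster partitions for an order-4 tensor with $t = 1$, and records the largest observed ratio between the left-hand side of (3.4) and $\max_j \|A - \mathbf{Q}_j^0 \cdot A\|_F^2$ for levels $l = 1$ and $l = 2$. The lemma predicts ratios of at most $2^l$; a finite simulation can only fail to find a counterexample, it does not prove the bound.

```python
import numpy as np


def multilinear(mats, A):
    """(P_1, ..., P_m) . A, applying P_k along mode k (see (2.5))."""
    out = np.asarray(A, dtype=float)
    for mode, P in enumerate(mats):
        out = np.moveaxis(np.tensordot(P, out, axes=(1, mode)), 0, mode)
    return out


def cluster_projection(labels):
    """Projection C~ C~^T of a clustering: entry (i, j) is 1/|c| if i and j share cluster c."""
    same = labels[:, None] == labels[None, :]
    return same / same.sum(axis=1, keepdims=True)


def residual(A, mats):
    """||A - (mats) . A||_F^2."""
    return float(np.sum((A - multilinear(mats, A)) ** 2))


rng = np.random.default_rng(4)
shape = (6, 5, 4, 3)                          # m = 4, t = 1, so h = 2
eye = [np.eye(n) for n in shape]
worst = {1: 0.0, 2: 0.0}                      # largest observed ratio per level l
for _ in range(200):
    A = rng.standard_normal(shape)
    Ps = [cluster_projection(rng.integers(0, 2, size=n)) for n in shape]
    # Level-0 collections Q_j^0: cluster dimension j only, identity elsewhere.
    single = [residual(A, eye[:j] + [Ps[j]] + eye[j + 1:]) for j in range(4)]
    level1 = residual(A, Ps[:2] + eye[2:])    # Q_1^1 clusters dimensions 1 and 2
    level2 = residual(A, Ps)                  # Q_1^2 clusters all four dimensions
    if max(single[:2]) > 0:
        worst[1] = max(worst[1], level1 / max(single[:2]))
    if max(single) > 0:
        worst[2] = max(worst[2], level2 / max(single))
print(worst)
```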

Proof (Thm. 3.1, Case 1). Let $m = 2^h t$. Using an algorithm with guarantee $\alpha_t$, we cluster each subset (indexed by $i$) of $t$ dimensions to obtain $\mathbf{Q}_i^0$. Let $S_i$ be the optimal sub-clustering of subset $i$, i.e., the result that would be obtained if $\alpha_t$ were 1. We bound the objective for the collection of all $m$ sub-clusterings $\mathbf{P}_1^h = \mathbf{Q}_1^h$ as

$$\|A - \mathbf{Q}_1^h \cdot A\|_F^2 \le 2^h \max_j \|A - \mathbf{Q}_j^0 \cdot A\|_F^2 \le 2^h \alpha_t \max_j \|A - S_j \cdot A\|_F^2. \tag{3.5}$$

The first inequality follows from Lemma 3.4, while the last inequality follows from the $\alpha_t$ approximation factor that we used to get the sub-clusterings $\mathbf{Q}_j^0$.

So far we have related our approximation to an optimal sub-clustering across a set of dimensions. Let us hence look at the relation between such an optimal sub-clustering $S$ of the first $t$ dimensions (via permutation, these dimensions correspond to an arbitrary subset of size $t$), and the optimal tensor clustering $F$ across all the $m = 2^h t$ dimensions. Recall that a clustering can be expressed either by the projection matrices collected in $\mathbf{Q}_1^l$, or by cluster indicator matrices $C_j$ together with the mean tensor $M$, so that

$$(C_1, \ldots, C_{2^l t}, I^l) \cdot M = \mathbf{Q}_1^l \cdot A.$$
Let $C_j^S$ and $C_j^F$ be the dimension-wise cluster indicator matrices for $S$ and $F$, respectively. By definition, $S$ solves

$$\min_{C_1, \ldots, C_t,\, M} \|A - (C_1, \ldots, C_t, I^0) \cdot M\|_F^2, \quad \text{s.t. } C_j \in \{0, 1\}^{n_j \times k_j},$$

which makes $S$ even better than the sub-clustering $(C_1^F, \ldots, C_t^F)$ induced by the optimal $m$-dimensional clustering $F$. Thus,

$$\begin{aligned}
\|A - S \cdot A\|_F^2 &\le \min_M \|A - (C_1^F, \ldots, C_t^F, I^0) \cdot M\|_F^2\\
&\le \|A - (C_1^F, \ldots, C_t^F, I^0)(I, \ldots, I, C_{t+1}^F, \ldots, C_m^F) \cdot M_F\|_F^2\\
&= \|A - F \cdot A\|_F^2,
\end{aligned} \tag{3.6}$$

where $M_F$ is the tensor of means for the optimal $m$-dimensional clustering. Combining (3.5) with (3.6) yields the final bound for the combined clustering $C = \mathbf{Q}_1^h$,

$$J_m(C) = \|A - \mathbf{Q}_1^h \cdot A\|_F^2 \le 2^h \alpha_t \|A - F \cdot A\|_F^2 = 2^h \alpha_t J_{\mathrm{OPT}}(m),$$

which completes the proof of the theorem. □

3.3 Analysis: Theorem 3.1, Metric case

Now we present our proof of Thm. 3.1 for the case where $d(x, y)$ is a metric, such as an $\ell_p$ distance or a separable Hilbertian metric. For this case, recall that the tensor clustering problem is

$$\underset{(C_1, \ldots, C_m),\, M}{\text{minimize}}\; J(C) = d\big(A, (C_1, \ldots, C_m) \cdot M\big), \quad \text{s.t. } C_j \in \{0, 1\}^{n_j \times k_j}. \tag{3.7}$$

Since in general the best representative $M$ is not the mean tensor, we cannot use the shorthand $P \cdot A$ for $(C_1, \ldots, C_m) \cdot M$, so the proof is different from the Euclidean case.
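For the $\ell_1$ metric, for instance, each entry of the optimal $M$ is a median of its block, since the median minimizes a sum of absolute deviations. The NumPy sketch below (matrix case, with hypothetical helper names and heavy-tailed synthetic data) compares medians and means as representatives for the same fixed co-clustering.

```python
import numpy as np


def block_representatives(A, row_labels, col_labels, stat):
    """Matrix case of (3.7): one representative per (row cluster, column cluster) block."""
    kr, kc = row_labels.max() + 1, col_labels.max() + 1
    M = np.zeros((kr, kc))
    for r in range(kr):
        for c in range(kc):
            block = A[np.ix_(row_labels == r, col_labels == c)]
            if block.size:
                M[r, c] = stat(block)
    return M


def l1_objective(A, row_labels, col_labels, M):
    """sum_ij |a_ij - m_{r(i) c(j)}|, i.e. d(A, (C_1, C_2) . M) for the l1 metric."""
    return float(np.abs(A - M[np.ix_(row_labels, col_labels)]).sum())


rng = np.random.default_rng(5)
A = rng.standard_cauchy((30, 20))            # heavy-tailed entries, where mean and median differ
rows = rng.integers(0, 3, size=30)
cols = rng.integers(0, 2, size=20)

M_mean = block_representatives(A, rows, cols, np.mean)
M_median = block_representatives(A, rows, cols, np.median)
print(l1_objective(A, rows, cols, M_median) <= l1_objective(A, rows, cols, M_mean))  # True
```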

Proof. We will split the dimensions in a different way. Let $R_i^j$ be the collection of clusterings of dimensions $i, \ldots, j$ (padded with identity matrices for all other dimensions); $R_i^j$ combines the $C_k$ in a manner analogous to how $\mathbf{Q}_i^l$ combines projection matrices. For simplicity, the proof here is for clustering single dimensions at a time, but it generalizes in a straightforward way to chunks of $t$ dimensions, leading to a factor $2m/t$ instead of $2m$.

Let us first prove a relation for any subset of the last $m - i + 1$ dimensions, $R_i^i R_{i+1}^m = R_i^m$. Let $M_i^i = \operatorname{argmin}_X d(A, R_i^i \cdot X)$ and $M_{i+1}^m = \operatorname{argmin}_X d(A, R_{i+1}^m \cdot X)$ be the optimal representatives for the clustering collections $R_i^i$ and $R_{i+1}^m$, and let

$$\hat{M}_i = \operatorname{argmin}_X d\big(R_i^i \cdot M_i^i,\; R_i^i R_{i+1}^m \cdot X\big), \quad X \in \mathbb{R}^{n_1 \times \cdots \times n_{i-1} \times k_i \times \cdots \times k_m}.$$

The index $\iota$ will run over dimension $i$, and the multi-indices $r$, $j$ over dimensions $1, \ldots, i-1$ and $i+1, \ldots, m$, respectively. The indices $I$ and multi-indices $J$ refer to the clusters in $R_i^i$ and $R_{i+1}^m$, respectively. Since $\hat{M}_i$ is the element-wise minimum, and the entries $(M_i^i)_{rIj}$ do not depend on $\iota \in I$, we have

$$\begin{aligned}
d\big(R_i^i \cdot M_i^i,\; R_i^i R_{i+1}^m \cdot \hat{M}_i\big) &= \sum_{I, J} \sum_{\iota \in I,\, r} \min_{\mu} \sum_{j \in J} d\big((M_i^i)_{rIj}, \mu\big)\\
&\le \sum_{I, J} \sum_{\iota \in I,\, r} \sum_{j \in J} d\big((M_i^i)_{rIj}, (M_{i+1}^m)_{r\iota J}\big) = d\big(R_i^i \cdot M_i^i,\; R_{i+1}^m \cdot M_{i+1}^m\big).
\end{aligned}$$

We use this relation and the triangle inequality to break down $R_1^m$ into its single-dimensional parts. We then relate the objectives of these parts to the optimal single-dimensional clusterings $S_i$:

$$\begin{aligned}
\min_{M_m} d(A, R_1^m \cdot M_m) &\le d(A, R_1^1 R_2^m \cdot \hat{M}_1)\\
&\le d(A, R_1^1 \cdot M_1^1) + d(R_1^1 \cdot M_1^1, R_1^1 R_2^m \cdot \hat{M}_1)\\
&\le d(A, R_1^1 \cdot M_1^1) + d(R_1^1 \cdot M_1^1, R_2^m \cdot M_2^m)\\
&\le 2d(A, R_1^1 \cdot M_1^1) + d(A, R_2^m \cdot M_2^m)\\
&\le 2d(A, R_1^1 \cdot M_1^1) + 2d(A, R_2^2 \cdot M_2^2) + d(A, R_3^m \cdot M_3^m) \qquad \text{(3.8)}\\
&\le \cdots\\
&\le 2\sum_{i=1}^{m} d(A, R_i^i \cdot M_i^i) \le 2\sum_{i=1}^{m} \alpha_1 \min_{X_i} d(A, S_i \cdot X_i). \qquad \text{(3.9)}
\end{aligned}$$

For (3.8), we applied the same steps as before to $R_2^m = R_2^2 R_3^m$, and then continued this breakdown, always splitting off the first remaining dimension. The last relation follows from the 1D approximation algorithm that was used. What is left is to bound (3.9) by the objective for the optimal $m$-dimensional clustering $F \cdot M_F = F_1 F_2 \cdots F_m \cdot M_F$. Note that, since non-clustered dimensions have identity matrices, the cluster parts commute: $F_i F_j X = F_j F_i X$. Owing to the optimality of $S_i$, we have

$$\min_{X_i} d(A, S_i \cdot X_i) \le \min_{Y_i} d(A, F_i \cdot Y_i) \le \min_{Y_m} d\big(A, F_i (F_1 \cdots F_{i-1} F_{i+1} \cdots F_m \cdot Y_m)\big) = d(A, F \cdot M_F)$$

for any term in the sum (3.9). Thus, it follows that

$$\min_{M_m} d(A, R_1^m \cdot M_m) \le 2\sum_{i=1}^{m} \alpha_1 \min_{X_i} d(A, S_i \cdot X_i) \le 2m\alpha_1\, d(A, F \cdot M_F),$$

which completes the proof. □



3.4 Theorem 3.1 with Bregman divergences



Theorem 3.1 also applies to Bregman divergences that can be bounded in terms of squared Euclidean distances; for every Bregman divergence, the best representative is the tensor of means defined in Equation (3.2).

The Bregman divergence $B_f(x, y)$ between scalars $x$ and $y$ is defined as [13, 11]

$$B_f(x, y) = f(x) - f(y) - f'(y)(x - y), \tag{3.10}$$

for a given strictly convex, differentiable function $f$. With $f(x) = \tfrac12 x^2$ the divergence (3.10) reduces to half the familiar squared Euclidean distance, $\tfrac12 (x - y)^2$, while for $f(x) = x \log x$ it turns into the (generalized) KL divergence $x \log(x/y) - x + y$. For tensors, we extend Definition (3.10) by considering separable Bregman divergences, so that

$$B_f(X, Y) = \sum_{i_1, \ldots, i_m} B_f(x_{i_1 \ldots i_m}, y_{i_1 \ldots i_m}).$$
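Definition (3.10) and its separable extension are straightforward to code. This NumPy sketch (the names `bregman`, `half_square` and `x_log_x` are illustrative, not from the paper) checks the two special cases above and illustrates the property used throughout this section: the arithmetic mean minimizes $\sum_i B_f(x_i, c)$ over $c$ (here compared against the median as an alternative representative).

```python
import numpy as np


def bregman(f, grad_f, X, Y):
    """Separable Bregman divergence: sum over entries of f(x) - f(y) - f'(y)(x - y)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    return float(np.sum(f(X) - f(Y) - grad_f(Y) * (X - Y)))


half_square = (lambda x: 0.5 * x ** 2, lambda x: x)
x_log_x = (lambda x: x * np.log(x), lambda x: np.log(x) + 1.0)

rng = np.random.default_rng(6)
X = rng.uniform(0.5, 2.0, size=(3, 4, 2))
Y = rng.uniform(0.5, 2.0, size=(3, 4, 2))

sq = bregman(*half_square, X, Y)             # equals 0.5 * sum (x - y)^2
kl = bregman(*x_log_x, X, Y)                 # equals sum x log(x / y) - x + y

# For any Bregman divergence, the mean minimizes sum_i B_f(x_i, c) over c.
block = X.ravel()
mean_cost = bregman(*x_log_x, block, np.full_like(block, block.mean()))
median_cost = bregman(*x_log_x, block, np.full_like(block, np.median(block)))
print(mean_cost <= median_cost)  # True
```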

Let $\sigma_U$ and $\sigma_L$ be upper and lower bounds, respectively, with $\sigma_L > 0$, such that

$$\sigma_L B_f(x, y) \le \|x - y\|^2 \le \sigma_U B_f(x, y) \tag{3.11}$$

for all $x, y$ in the convex hull of the entries of the given tensor $A$. For the KL divergence, the data must then be bounded away from zero.
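Why zero must be excluded can be made explicit. For $f(x) = x \log x$ we have $f''(\xi) = 1/\xi$, and Taylor's theorem with Lagrange remainder gives $B_f(x, y) = \tfrac12 f''(\xi)(x - y)^2 = (x - y)^2 / (2\xi)$ for some $\xi$ between $x$ and $y$. If all entries lie in an interval $[a, b]$ with $a > 0$, then $(x - y)^2 = 2\xi\, B_f(x, y)$ with $\xi \in [a, b]$, so (3.11) holds with $\sigma_L = 2a$ and $\sigma_U = 2b$; as $a \to 0$, $\sigma_L \to 0$. The NumPy check below samples hypothetical data in $[a, b]$ and confirms the ratio stays within $[\sigma_L, \sigma_U]$; it is a sanity check, not a proof.

```python
import numpy as np


def gen_kl(x, y):
    """Scalar generalized KL divergence B_f(x, y) for f(x) = x log x."""
    return x * np.log(x / y) - x + y


a, b = 0.1, 3.0                              # hypothetical range of the tensor entries
sigma_L, sigma_U = 2 * a, 2 * b              # bounds derived in the text above
rng = np.random.default_rng(7)
x = rng.uniform(a, b, 100_000)
y = rng.uniform(a, b, 100_000)
keep = np.abs(x - y) > 1e-2                  # avoid 0/0 and cancellation when x is close to y
ratio = (x[keep] - y[keep]) ** 2 / gen_kl(x[keep], y[keep])
print(ratio.min() >= sigma_L, ratio.max() <= sigma_U)  # True True
```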

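Since the means tensor is the best representative $\operatorname{argmin}_X B_f(A, R \cdot X)$ for a clustering $R$, we again use projection matrices to express clusterings. Let $\mathbf{Q}_1^h$ be, as above, the full combination of projection matrices from dimension-wise clustering, and $F = \operatorname{argmin}_Q B_f(A, Q \cdot A)$ the optimal $m$-dimensional tensor clustering. Then we know that

$$\begin{aligned}
B_f(A, \mathbf{Q}_1^h \cdot A) &\le \sigma_L^{-1} \|A - \mathbf{Q}_1^h \cdot A\|_F^2\\
&\le \sigma_L^{-1}\, 2^{\log(m/t)} \max_j \|A - \mathbf{Q}_j^0 \cdot A\|_F^2 \qquad \text{(3.12)}\\
&\le \frac{\sigma_U}{\sigma_L}\, 2^{\log(m/t)} \max_j B_f(A, \mathbf{Q}_j^0 \cdot A)\\
&\le \frac{\sigma_U}{\sigma_L}\, 2^{\log(m/t)}\, \alpha_t\, B_f(A, F \cdot A), \qquad \text{(3.13)}
\end{aligned}$$

so $\rho_d = \sigma_U / \sigma_L$. Inequality (3.12) follows from Lemma 3.4, and Inequality (3.13) follows from the $\alpha_t$ guarantee of the dimension-wise clustering together with an argument analogous to Equation (3.6).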

Curvature bounds as in (3.11) seem to be necessary for Bregman divergences to guarantee constant approximation factors for the underlying 1D clustering. This intuition is reinforced by the results of [3], who avoided such curvature assumptions and had to be content with a non-constant $O(\log n)$ approximation factor for information-theoretic clustering.

3.5 Implications 

To obtain concrete bounds for a variety of tensor clustering problems, we can use Theorem 3.1 for $t = 1$ or $t = 2$ with existing approximation factors $\alpha_t$ from the literature. Table 1 summarizes the results.

3.5.1 1D factors for Metric and Bregman clustering

The $(1 + \epsilon)$ approximation factor for 1D clustering by Ackermann et al. applies to all metrics. It leads to an $m$-dimensional approximation factor of $\alpha_m = p(m/t)(1 + \epsilon)$. Arthur and Vassilvitskii [?] prove a guarantee in expectation of $\alpha_1 = 8(\log K + 2)$ for $K$ clusters with Euclidean k-means, resulting in an expected $\alpha_m = 8p(m/t)(\log K + 2)$.

For Bregman clustering, we arrive at similar results with the approximation factor by Ackermann and Blömer [1] or the extension of [?] in [30], [35].

3.5.2 Hilbertian metrics 

A special example of metrics are Hilbertian metrics [24], [33] that arise from conditionally positive definite (CPD) kernels. A real-valued function $C: S \times S \to \mathbb{R}$ is called a conditionally positive definite (CPD) kernel on $S$ if for any positive integer $n$, any choice of $n$ elements $x_i \in S$ ($i = 1, \ldots, n$), and any choice of $n$ reals $u_i \in \mathbb{R}$ such that $\sum_i u_i = 0$, we have $\sum_{i,j=1}^{n} u_i u_j C(x_i, x_j) \ge 0$ [11]. The following remarkable result [32] connects CPD kernels and Hilbertian metrics, i.e., metrics which can be isometrically embedded in Hilbert space: there exists a Hilbert space $\mathcal{H}$ of real-valued functions on $S$, and a mapping $\Phi: S \to \mathcal{H}$ such that

$$\|\Phi(x) - \Phi(y)\|^2 = -C(x, y) + \tfrac12\big(C(x, x) + C(y, y)\big) = d_C(x, y),$$

if and only if $C(\cdot, \cdot)$ is a CPD kernel. Hence, given a CPD kernel $C$, one can construct a Hilbertian metric $d_C(x, y)$ which behaves like the squared Euclidean distance in the Hilbert space. The corresponding kernel is $K(x, y) = \tfrac12\big(C(x, y) - C(x, a) - C(y, a) + C(a, a)\big)$ for some fixed $a \in S$. Here, we choose $S \subseteq \mathbb{R}$ and define the distance of tensors $X$, $Y$ as

$$d_C(X, Y) = \sum_{i_1, \ldots, i_m} d_C(x_{i_1 \ldots i_m}, y_{i_1 \ldots i_m}).$$
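As a concrete, illustrative instance, take $S \subseteq \mathbb{R}$ and $C(x, y) = -|x - y|$, a standard example of a CPD kernel because $|x - y|$ is conditionally negative definite; then $d_C(x, y) = |x - y|$. The NumPy check below tests the CPD condition on random zero-sum coefficient vectors, and confirms that the anchored kernel $K$ is positive semidefinite and reproduces $d_C$ as a squared feature-space distance. It only illustrates the construction; it does not prove the result of [32].

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(-3.0, 3.0, size=40)          # hypothetical points of S, a subset of R


def C(u, v):
    """Example CPD kernel (our choice): C(x, y) = -|x - y|."""
    return -np.abs(u - v)


Cmat = C(x[:, None], x[None, :])

# CPD check: u^T C u >= 0 for random coefficient vectors with sum(u) = 0.
u = rng.standard_normal((1000, x.size))
u -= u.mean(axis=1, keepdims=True)
quad = np.einsum("si,ij,sj->s", u, Cmat, u)

# Hilbertian distance d_C and the kernel K anchored at a fixed point a in S.
dC = -Cmat + 0.5 * (np.diag(Cmat)[:, None] + np.diag(Cmat)[None, :])
a = x[0]
K = 0.5 * (Cmat - C(x, a)[:, None] - C(x, a)[None, :] + C(a, a))
dK = np.diag(K)[:, None] + np.diag(K)[None, :] - 2 * K   # squared feature-space distance

print(quad.min() >= -1e-9, np.allclose(dC, dK), np.linalg.eigvalsh(K).min() >= -1e-9)  # True True True
```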

Since the argument by [?] for their kmeans++ is independent of the dimensionality, it can be generalized from Euclidean distance to distances in a Hilbert space.
Lemma 3.5 (1D Hilbertian Metric Clustering). For any 1D clustering with a Hilbertian metric $d_C$, one can construct a kmeans++ based initialization followed by iterative updates using kernel k-means such that if $C$ is the final clustering, then

$$\mathbb{E}[J(C)] \le 8(\log K + 2)\, J_{\mathrm{OPT}}. \tag{3.14}$$

Proof. Using $d_C(x, y) = \|\Phi(x) - \Phi(y)\|^2$, we can use the initialization by [?] in the Hilbert space on the mapped data points $\Phi(x)$, since it only depends on squared Euclidean distances or inner products, independent of the dimensionality of the space. Finally, running kernel k-means starting from the kmeans++ initialization can only decrease the objective, so the bound is preserved. □
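A minimal sketch of the procedure in Lemma 3.5, assuming NumPy: $D^2$ seeding (as in kmeans++) on $d_C(x_i, x_j) = K_{ii} + K_{jj} - 2K_{ij}$, followed by kernel k-means (Lloyd) updates that use only the kernel matrix. The kernel is the anchored $K$ built from the CPD kernel $C(x, y) = -|x - y|$, an illustrative choice, and the function names are hypothetical. The sketch checks that the updates do not increase the objective; it does not verify the expectation bound (3.14).

```python
import numpy as np


def kernel_objective(K, labels):
    """sum_i ||Phi(x_i) - mu_c(i)||^2, computed from the kernel matrix only."""
    total = 0.0
    for c in np.unique(labels):
        idx = labels == c
        Kc = K[np.ix_(idx, idx)]
        total += np.trace(Kc) - Kc.sum() / idx.sum()
    return float(total)


def kernel_kmeanspp(K, k, rng, iters=50):
    """D^2 seeding on d(x, y) = K(x,x) + K(y,y) - 2K(x,y), then kernel k-means (Lloyd) updates."""
    n = K.shape[0]
    diag = np.diag(K)
    D = np.maximum(diag[:, None] + diag[None, :] - 2 * K, 0.0)   # clip tiny negative round-off
    centers = [int(rng.integers(n))]
    while len(centers) < k:
        d2 = D[:, centers].min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else np.full(n, 1.0 / n)
        centers.append(int(rng.choice(n, p=p)))
    labels = D[:, centers].argmin(axis=1)                        # assign to the closest seed
    init = labels.copy()
    for _ in range(iters):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            idx = labels == c
            if idx.any():
                m = idx.sum()
                # ||Phi(x) - mu_c||^2 = K(x,x) - 2/m sum_j K(x,j) + 1/m^2 sum_{j,l} K(j,l)
                dist[:, c] = diag - 2 * K[:, idx].sum(axis=1) / m + K[np.ix_(idx, idx)].sum() / m ** 2
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels, init


rng = np.random.default_rng(9)
x = np.concatenate([rng.normal(mu, 0.3, 20) for mu in (-2.0, 0.0, 2.0)])
a = x[0]
K = 0.5 * (-np.abs(x[:, None] - x[None, :]) + np.abs(x - a)[:, None] + np.abs(x - a)[None, :])
labels, init = kernel_kmeanspp(K, 3, rng)
print(kernel_objective(K, labels) <= kernel_objective(K, init) + 1e-9)  # True
```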

Together with Theorem 3.1, Lemma 3.5 directly leads to a tensor clustering guarantee for Hilbertian metrics:

$$\mathbb{E}[J(C)] \le 8m(\log K^* + 2)\, J_{\mathrm{OPT}}(m), \tag{3.15}$$

where $K^* = \max_{1 \le j \le m} k_j$ is the maximum number of clusters across all dimensions.

3.5.3 2D factor for binary $\ell_1$ clustering



Applying the results of [31] for binary matrices as $\alpha_2$ yields the slightly stronger bound for $\ell_1$ tensor clustering:

$$J(C) \le 3^{\log(m) - 1} (1 + \sqrt{2})\, \alpha_1\, J_{\mathrm{OPT}}(m).$$



Table 1: Approximation guarantees for Tensor Clustering Algorithms. $K^*$ denotes the maximum number of clusters, i.e., $K^* = \max_j k_j$; $c$ is some constant.

| Problem name | Approx. bound | Proof |
|---|---|---|
| Metric tensor clustering | $J(C) \le m(1 + \epsilon)\, J_{\mathrm{OPT}}(m)$ | Thm. 3.1 + [2] |
| Bregman tensor clustering | $\mathbb{E}[J(C)] \le 8mc(\log K^* + 2)\, J_{\mathrm{OPT}}(m)$ | (3.11), Thm. 3.1 + [30, 35] (using [6]) |
| Bregman tensor clustering | $J(C) \le m \frac{\sigma_U}{\sigma_L} (1 + \epsilon)\, J_{\mathrm{OPT}}(m)$ | (3.11), Thm. 3.1 + [1] |
| Bregman co-clustering | Above two results with $m = 2$ | as above |
| Hilbertian metrics | $\mathbb{E}[J(C)] \le 8m(\log K^* + 2)\, J_{\mathrm{OPT}}(m)$ | Thm. 3.1 + Lemma 3.5 |



4 Experiments 

Our bounds depend strongly on the approximation factor $\alpha_t$ of an underlying $t$-dimensional clustering method. In our experiments, we study this close dependence for $t = 1$, wherein we compare the tensor clusterings arising from different 1D methods of varying sophistication. Keep in mind that the purpose of comparing the 1D methods is to see their impact on the tensor clustering built on top of them.

Our experiments reveal that the empirical approximation factors are usually smaller than the theoretical bounds, and that these factors depend on statistical properties of the data. We also observe the linear dependence of the CoTeC objectives on the associated 1D objectives, as suggested by Thm. 3.3 (for Euclidean distance) and Table 1 (2nd row, for KL-divergence).

Further comparisons show that in practice, CoTeC is competitive with a greedy heuristic, SiTeC (Simultaneous Tensor Clustering), which takes all dimensions into account simultaneously but lacks theoretical guarantees. As expected, initializing SiTeC with CoTeC yields lower final objective values using fewer "simultaneous" iterations.

Regarding divergences, we test CoTeC with Euclidean distance and KL-divergence. To study the effect of the 1D method, we use two seeding methods for each divergence: uniform and distance-based sampling. The distance-based seeding guarantees 1D approximation factors for $\mathbb{E}[J(C)]$, via the k-means++ analysis for Euclidean clustering and via [30, 35] for KL-divergence.
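For KL-divergence, distance-based seeding follows the k-means++ pattern, with the squared Euclidean distance replaced by the divergence of each point to its nearest chosen center. The sketch below assumes that the rows of the data are strictly positive probability vectors, so that every divergence is finite. It takes the divergence as KL(x‖c), point first, which is the orientation for which cluster means are the optimal Bregman representatives. The exact sampling distributions of [30, 35] may differ, and all function names are hypothetical.

```python
import numpy as np


def kl_rows(X, c):
    """KL(x_i || c) for every row x_i of X (rows and c: positive probability vectors)."""
    return np.sum(X * np.log(X / c), axis=1)


def kl_seeding(X, k, rng):
    """Choose k rows of X as centers; each new center is sampled with probability
    proportional to the KL-divergence of a point to its nearest chosen center."""
    n = X.shape[0]
    centers = [int(rng.integers(n))]
    d = np.maximum(kl_rows(X, X[centers[0]]), 0.0)  # clip round-off below zero
    for _ in range(1, k):
        total = d.sum()
        probs = d / total if total > 0 else np.full(n, 1.0 / n)
        c = int(rng.choice(n, p=probs))
        centers.append(c)
        d = np.minimum(d, np.maximum(kl_rows(X, X[c]), 0.0))
    return centers


def kl_assign(X, centers):
    """Assign each row to the center with the smallest KL(x || center)."""
    D = np.stack([kl_rows(X, X[c]) for c in centers], axis=1)
    return D.argmin(axis=1)


rng = np.random.default_rng(1)
X = rng.random((40, 6)) + 0.05          # strictly positive entries
X /= X.sum(axis=1, keepdims=True)       # each row is a probability vector
centers = kl_seeding(X, 3, rng)
labels = kl_assign(X, centers)
```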



We use each seeding by itself and as an initialization for k-means, which gives four 1D methods for each divergence. We refer to the CoTeC combination of the corresponding independent 1D clusterings by the following abbreviations:
r: Randomly (uniformly) sample centers from the data points; assign each point to its closest center.
s: Sample centers using distance-specific seeding (k-means++ for Euclidean distance, [30, 35] for KL-divergence); assign each point to its closest center.
rk: Initialize Euclidean or Bregman k-means with 'r'.
sk: Initialize Euclidean or Bregman k-means with 's'. 
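The four 1D methods listed above can be sketched for the Euclidean case as follows. This is an illustrative reimplementation, not the authors' code: `seed_uniform` corresponds to r, `seed_d2` to the k-means++ style seeding s, and `lloyd` to the k-means refinement used in rk and sk. All names are hypothetical.

```python
import numpy as np


def seed_uniform(X, k, rng):
    """'r': k distinct data points chosen uniformly at random as centers."""
    return X[rng.choice(X.shape[0], size=k, replace=False)]


def seed_d2(X, k, rng):
    """'s' (Euclidean): k-means++ style D^2 sampling of centers."""
    n = X.shape[0]
    C = [X[rng.integers(n)]]
    d2 = ((X - C[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        C.append(X[rng.choice(n, p=probs)])
        d2 = np.minimum(d2, ((X - C[-1]) ** 2).sum(axis=1))
    return np.array(C)


def assign(X, C):
    """Index of the closest center for every row of X."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def lloyd(X, C, iters=100):
    """'rk'/'sk': Lloyd iterations (Euclidean k-means) started from centers C."""
    C = C.copy()
    for _ in range(iters):
        labels = assign(X, C)
        # keep the old center if a cluster becomes empty
        newC = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                         for j in range(len(C))])
        if np.allclose(newC, C):
            break
        C = newC
    return assign(X, C), C


def cost(X, labels, C):
    """Sum of squared distances of the points to their assigned centers."""
    return float(((X - C[labels]) ** 2).sum())


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(30, 2)) for loc in (0.0, 3.0, 6.0)])
C_r = seed_uniform(X, 3, rng)
C_s = seed_d2(X, 3, rng)
labels_r, labels_s = assign(X, C_r), assign(X, C_s)  # methods 'r' and 's'
labels_rk, C_rk = lloyd(X, C_r)                      # method 'rk'
labels_sk, C_sk = lloyd(X, C_s)                      # method 'sk'
```

Since Lloyd iterations never increase the cost, rk and sk are never worse than the seeding they start from, which mirrors how k-means can only improve on r and s in the experiments.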


We compare the four versions of CoTeC to SiTeC, an algorithm without guarantees that considers the groupings in all dimensions together. For Euclidean distances in 2D, we use the minimum sum-squared residue co-clustering of [16] as SiTeC, while for Euclidean 3D and Bregman tensor clustering we generalize Algorithm 1 of [9]. Initializing SiTeC with each of the above schemes results in four further variants:

rc: SiTeC initialized with the results of 'r' 
sc: SiTeC initialized with the results of 's' 
rkc: SiTeC initialized with the results of 'rk' 
skc: SiTeC initialized with the results of 'sk' 

These variants inherit the guarantees of CoTeC, since SiTeC monotonically decreases the objective value starting from the CoTeC solution.
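The combination step shared by all CoTeC variants can be sketched for the 2D Euclidean case: rows and columns are clustered independently (rows as points in $\mathbb{R}^{n_2}$, columns as points in $\mathbb{R}^{n_1}$), and the two labelings are combined into a co-clustering. We assume the Euclidean co-clustering objective is the squared distance of $A$ to the matrix of co-cluster (block) means. The function names are hypothetical, and the SiTeC refinement is not shown.

```python
import numpy as np


def kmeans_1d_labels(X, k, rng, iters=50):
    """Plain Euclidean k-means (uniform seeding + Lloyd) on the rows of X."""
    C = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(0)
    return ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)


def block_means(A, r, c, k1, k2):
    """Mean of A over every co-cluster (row cluster a, column cluster b)."""
    M = np.zeros((k1, k2))
    for a in range(k1):
        for b in range(k2):
            block = A[np.ix_(r == a, c == b)]
            M[a, b] = block.mean() if block.size else 0.0
    return M


def coclustering_cost(A, r, c, k1, k2):
    """Euclidean co-clustering objective ||A - M[r][:, c]||_F^2."""
    M = block_means(A, r, c, k1, k2)
    return float(((A - M[np.ix_(r, c)]) ** 2).sum())


def cotec_2d(A, k1, k2, rng):
    """Combine independent row and column clusterings into a co-clustering."""
    r = kmeans_1d_labels(A, k1, rng)    # rows of A as points in R^{n2}
    c = kmeans_1d_labels(A.T, k2, rng)  # columns of A as points in R^{n1}
    return r, c


rng = np.random.default_rng(0)
r_true = rng.integers(0, 3, size=40)
c_true = rng.integers(0, 2, size=30)
M_true = rng.normal(0.0, 5.0, size=(3, 2))
A = M_true[np.ix_(r_true, c_true)] + rng.normal(0.0, 0.1, size=(40, 30))
r, c = cotec_2d(A, 3, 2, rng)
J = coclustering_cost(A, r, c, 3, 2)
```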



4.1 Experiments on synthetic data 

For a controlled setting with synthetic data, we generate tensors $A$ of size $75 \times 75 \times 50$ and $75 \times 75$, for which we randomly choose a $5 \times 5 \times 5$ tensor of means $M$ and binary cluster indicator matrices with five clusters per dimension. For clustering with Euclidean distances we add Gaussian noise from $\mathcal{N}(0, \sigma^2)$ with varying $\sigma$ to $A$, while for KL-divergence we use the sampling method of [?] with varying noise.

For each noise level, we repeat the 1D seeding 20 times on each of five generated tensors and average the resulting 100 objective values. To estimate the approximation factor $\alpha_m$ on a tensor, we divide the achieved objective $J(C)$ by the objective value of the "true" underlying tensor clustering. Figure 1 shows the empirical approximation factor $\alpha_m$ for Euclidean distance and KL-divergence. Qualitatively, the plots for tensors of order 2 and 3 do not differ.
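The estimation procedure can be sketched for the Euclidean case. The sketch plants a tensor of $k \times k \times k$ means with random cluster labels and adds Gaussian noise. It then clusters every mode independently, using uniform seeding plus Lloyd iterations on the mode unfoldings (the rk variant), and divides the resulting objective by that of the planted clustering. This is a scaled-down, hypothetical reimplementation (smaller tensor, fewer repetitions, our own function names, per-run ratios averaged), not the code behind Figure 1. It assumes the Euclidean tensor objective is the squared distance to the tensor of block means. In this setting the ratio can fall slightly below 1, because the planted clustering need not be optimal for the noisy tensor.

```python
import numpy as np


def planted_tensor(shape, k, sigma, rng):
    """Random mean tensor M (k per mode), random labels per mode, N(0, sigma^2) noise."""
    labels = [rng.integers(0, k, size=n) for n in shape]
    M = rng.normal(0.0, 1.0, size=(k,) * len(shape))
    A = M[np.ix_(*labels)] + rng.normal(0.0, sigma, size=shape)
    return A, labels


def tensor_cost(A, labels, k):
    """Squared distance of A to the tensor of block means induced by the labels."""
    idx = np.ix_(*labels)
    sums = np.zeros((k,) * A.ndim)
    counts = np.zeros((k,) * A.ndim)
    np.add.at(sums, idx, A)        # accumulate entries per block
    np.add.at(counts, idx, 1.0)    # count entries per block
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return float(((A - means[idx]) ** 2).sum())


def kmeans_labels(X, k, rng, iters=30):
    """Uniform seeding ('r') followed by Lloyd iterations: the 'rk' 1D method."""
    C = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    for _ in range(iters):
        lab = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(lab == j):
                C[j] = X[lab == j].mean(0)
    return ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)


def cotec_labels(A, k, rng):
    """Cluster each mode independently on its unfolding (mode-j slices as points)."""
    out = []
    for mode in range(A.ndim):
        X = np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)
        out.append(kmeans_labels(X, k, rng))
    return out


def empirical_factor(shape, k, sigma, n_tensors, n_repeats, rng):
    """Average of J(C) / J(true clustering) over tensors and repeated runs."""
    ratios = []
    for _ in range(n_tensors):
        A, true_labels = planted_tensor(shape, k, sigma, rng)
        J_true = tensor_cost(A, true_labels, k)
        for _ in range(n_repeats):
            J = tensor_cost(A, cotec_labels(A, k, rng), k)
            ratios.append(J / J_true)
    return float(np.mean(ratios))


rng = np.random.default_rng(0)
alpha_hat = empirical_factor((20, 20, 10), k=5, sigma=0.5, n_tensors=5, n_repeats=4, rng=rng)
```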

In all settings, the empirical factor remains below the theoretical factor. The approximation factors decrease with higher noise, possibly because the estimates of $J(C)$ become less accurate on the one hand, and because all clusterings reach more similar objective values on the other. With low noise, distance-specific seeding (s) yields better results than uniform seeding (r), and adding k-means on top (rk, sk) improves the results of both. With Euclidean distances, CoTeC with well-initialized 1D k-means (sk) competes with SiTeC. For KL-divergence, though, SiTeC still improves on sk, and at high noise levels 1D k-means does not help: both rk and sk are only as good as their seeding-only counterparts.

In summary, the empirical approximation factor does depend on the data, but in general seems 
to be lower than the theoretical worst-case value. 



4.2 Experiments on real data 

We further assess the behavior of CoTeC on a number of real-world gene expression data sets.⁵

The first three of our data sets, Bcell (1332 × 62), AllAml (2088 × 72) and Breast (21906 × 77), are gene expression microarray data sets and are described in detail in [25]. Bcell is a lymphoma microarray data set of chronic lymphocytic leukemia, diffuse large B-cell lymphoma and follicular lymphoma. During preprocessing, only those genes were selected whose minimum expression level was above −1000.
Microarray data for B-cell and T-cell acute lymphocytic leukemia and acute myelogenous leukemia is collected in AllAml. Our data matrix is restricted to those genes whose ratio of maximum to minimum expression exceeds 10 and for which the difference between maximum and minimum expression is at least 1000. Breast refers to breast cancer data; the gene selection was the same as for Bcell.

⁵ We thank Hyuk Cho for kindly providing us the preprocessed data.

Figure 1: Approximation factors for 3D clustering (left) and co-clustering (right) with increasing noise. Top row: Euclidean distances; bottom row: KL-divergence. The x-axis shows σ, the y-axis the empirical approximation factor.

The remaining two data sets are cancer microarray matrices from [15]. Leukemia (3571 × 72) [26] contains data from acute lymphoblastic leukemia or acute myeloid leukemia, and Mll (2474 × 72) includes data from three types of leukemia (ALL, AML, MLL).

Even though the data sets have labeled column clusters, we do not compare clustering results with the true labels, since the algorithm and its guarantees concern only the clustering objective function, whose optimum need not agree with the true labels. Moreover, we aim for a co-clustering result rather than single-dimensional clusterings, and labels are available for only one of the dimensions.

For each data set, we repeat the sampling of centers 30 times and average the resulting objective values. Tables 2 to 4 show detailed results. Panel (i) displays the objective value of the simplest CoTeC variant, r, as a baseline, and the relative improvement (in %) achieved by the other methods; panel (ii) gives the average number of SiTeC iterations. The methods are encoded as x, xk, xc, xkc, where x stands for r or s, depending on the row of the table.
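Panel (i) reports, for each method, how much lower its averaged objective is than that of the baseline r. Assuming the usual definition $100 \cdot (J_r - J_x)/J_r$ (our reading of "improvement upon 'r' in %"), it can be computed as follows. Negative entries, as for s on some Leukemia settings, then indicate a higher objective than r. The function names are hypothetical.

```python
def improvement_percent(j_ref, j_method):
    """Relative improvement (in %) of a method's objective over the reference 'r'.

    Positive values mean a lower (better) objective than the reference.
    """
    if j_ref <= 0:
        raise ValueError("reference objective must be positive")
    return 100.0 * (j_ref - j_method) / j_ref


def averaged_improvement(ref_runs, method_runs):
    """Average each method's objective over the repeated runs first, then compare."""
    j_ref = sum(ref_runs) / len(ref_runs)
    j_method = sum(method_runs) / len(method_runs)
    return improvement_percent(j_ref, j_method)


print(improvement_percent(100.0, 79.0))  # prints 21.0
```

Read in reverse under this definition, the entry 20.98 in the xk column of the first Bcell, Euclidean row of Table 2 (with $J_r = 6.00 \cdot 10^5$) corresponds to an averaged rk objective of about $4.74 \cdot 10^5$.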

Overall, the improvements obtained via the approximation algorithm depend on the data set under consideration and on the number of clusters sought. In general, the improvements are lower for the bispherically normalized data (e.g., the data sets from [15]) than for the other data sets.

For both distances, using 1D k-means on top of the seeding generally improves the combined co-clustering. The combination method seems particularly competitive for Euclidean distances. On the Bcell data (Table 2), the s variant of CoTeC (without k-means) can be as good as SiTeC with r initialization (rc). The distance-specific seeding (s) gains over uniform seeding as the clusters become smaller. For Bcell and Breast (Table 2), the combination of 1D k-means clusterings (rk and sk) slightly outperforms the SiTeC variants rc and sc.

Turning to KL-divergence, the impact of the 1D method varies with the data, as for Euclidean distance. Both 1D k-means and better seeding mostly improve the overall outcome. We observe the highest improvements on the AllAml data set. With KL-divergence, SiTeC is almost always at least slightly better than CoTeC.

Besides improving the final result, a good initialization aids SiTeC in yet another way: the 
average number of iterations it takes to converge decreases, at times to even less than half the 
reference value. 

Overall, the experiments demonstrate that the combination of good single-dimensional clusterings 
can already lead to reasonable co-clusterings in practice, which can at times be as good as the result 
of a simultaneous biclustering method. Used as an initialization, the CoTeC results improve the 
outcome of SiTeC and reduce the number of "simultaneous" iterations. 



5 Conclusions 

In this paper we presented a simple and, to our knowledge, the first approximation algorithm for Bregman and metric tensor clustering. Our approximation factor grows linearly with the order $m$ of the tensor for Bregman divergences, and is slightly superlinear in $m$ for arbitrary metrics. It is always linear in the quality of the sub-clusterings.

Our experiments demonstrated the dependence of the multi-dimensional clustering on the single-dimensional clusterings, confirming the dependence stated in the theoretical bound. On real-world data, the approximation algorithm is also suitable as an initialization for a simultaneous co-clustering algorithm, and endows the latter with its approximation guarantees. In fact, the approximation algorithm by itself can also yield reasonable results in practice.

In our experiments we used single-dimensional clusterings with guarantees for our overall approximation algorithm. An interesting direction for future work is the development of a simultaneous approximation algorithm, such as a specific co-clustering seeding scheme for multi-dimensional centers, which could then be used as a subroutine by our tensor clustering algorithm.
Table 2: (i) Improvement of CoTeC and SiTeC variants upon 'r' in %; the respective reference value ($J_2$ for 'r') is given in the x column of each r row. (ii) Average number of SiTeC iterations.



(i) Bcell, Euc.
k1   k2        x            xk      xc      xkc
 5    3    r   6.00·10^5    20.98   18.37   26.44
           s   8.52         24.97   22.83   29.53
 5    6    r   5.94·10^5    30.68   26.09   34.72
           s   16.97        33.35   32.06   37.33
20    3    r   5.75·10^5    31.66   20.05   33.05
           s   18.83        32.24   24.61   33.36
20    6    r   5.56·10^5    49.13   35.26   50.37
           s   34.97        50.55   43.93   51.66
50    3    r   5.63·10^5    31.10   14.77   31.76
           s   15.25        32.58   19.14   33.17
50    6    r   5.18·10^5    47.55   34.63   48.41
           s   36.22        49.83   43.77   50.55

(i) Bcell, KL
k1   k2        x            xk      xc      xkc
 5    3    r   3.73·10^-1   15.01   20.87   21.13
           s   1.53         14.31   20.43   20.26
 5    6    r   3.60·10^-1   15.76   21.23   21.62
           s   3.24         16.22   21.37   21.21
20    3    r   3.37·10^-1   17.59   22.23   23.26
           s   10.54        18.44   22.99   22.98
20    6    r   3.15·10^-1   18.62   24.51   25.43
           s   11.76        20.52   25.69   26.23
50    3    r   3.20·10^-1   15.70   20.12   21.07
           s   9.61         17.24   20.85   21.33
50    6    r   2.85·10^-1   16.38   21.61   22.57
           s   11.86        18.63   23.24   23.13

(ii) Bcell, Euc.
k1   k2   rc           rkc         sc          skc
 5    3   11.9 ± 3.3   3.3 ± 0.7   6.1 ± 2.8   3.5 ± 0.7
 5    6   11.9 ± 2.6   3.7 ± 1.7   6.6 ± 2.4   3.3 ± 1.3
20    3   7.0 ± 1.4    2.0 ± 0.2   3.9 ± 1.0   2.2 ± 0.5
20    6   11.3 ± 2.3   2.6 ± 0.8   5.1 ± 2.0   2.7 ± 0.7
50    3   6.2 ± 1.9    2.0 ± 0.0   3.5 ± 2.0   2.0 ± 0.0
50    6   8.1 ± 2.1    2.1 ± 0.3   4.1 ± 1.6   2.0 ± 0.0

(ii) Bcell, KL
k1   k2   rc           rkc         sc           skc
 5    3   10.1 ± 3.0   7.2 ± 3.0   11.1 ± 4.3   7.2 ± 3.5
 5    6   10.8 ± 3.1   8.1 ± 3.4   8.7 ± 2.9    6.8 ± 3.3
20    3   10.6 ± 2.8   7.5 ± 2.0   7.4 ± 1.8    7.0 ± 2.2
20    6   12.6 ± 3.4   8.8 ± 2.9   8.4 ± 2.1    8.1 ± 2.0
50    3   9.1 ± 2.3    6.2 ± 1.3   6.9 ± 1.8    6.0 ± 1.3
50    6   10.5 ± 1.8   7.7 ± 2.1   8.1 ± 2.3    6.9 ± 1.0

(i) Breast, Euc.
k1   k2        x            xk      xc      xkc
 5    2    r   1.43·10^5    22.96   20.48   24.47
           s   2.69         21.92   19.42   24.32
 5    4    r   1.42·10^5    26.49   25.85   27.30
           s   10.38        26.72   26.67   27.95
10    2    r   1.41·10^5    22.13   15.46   25.26
           s   7.77         21.66   19.20   25.09
10    4    r   1.37·10^5    26.36   24.09   28.93
           s   9.79         26.87   26.44   29.90
20    2    r   1.41·10^5    22.46   10.42   26.21
           s   8.16         22.54   19.43   26.16
20    4    r   1.37·10^5    27.95   23.44   31.71
           s   10.55        28.31   25.83   32.44

(i) Breast, KL
k1   k2        x            xk      xc      xkc
 5    2    r   2.70·10^-2   8.08    12.81   12.23
           s   1.77         7.98    13.19   12.38
 5    4    r   2.67·10^-2   11.88   17.56   17.31
           s   3.60         11.95   18.10   18.29
10    2    r   2.66·10^-2   8.01    11.44   12.37
           s   2.45         7.96    12.34   12.46
10    4    r   2.59·10^-2   11.17   16.54   17.92
           s   4.97         13.53   19.50   19.31
20    2    r   2.63·10^-2   6.27    9.72    9.95
           s   2.93         8.78    11.69   11.61
20    4    r   2.56·10^-2   11.73   17.42   17.78
           s   3.45         12.21   17.51   17.45

(ii) Breast, Euc.
k1   k2   rc          rkc         sc          skc
 5    2   4.6 ± 2.4   1.2 ± 0.4   4.0 ± 1.6   1.8 ± 0.4
 5    4   4.9 ± 1.8   1.0 ± 0.2   3.0 ± 0.9   1.2 ± 0.5
10    2   3.4 ± 1.4   2.0 ± 0.2   2.6 ± 1.0   2.0 ± 0.0
10    4   4.3 ± 1.8   2.0 ± 0.5   3.0 ± 0.9   2.1 ± 0.3
20    2   2.9 ± 1.3   2.0 ± 0.0   2.7 ± 1.0   2.0 ± 0.0
20    4   3.9 ± 1.3   2.1 ± 0.3   3.4 ± 1.8   2.0 ± 0.2

(ii) Breast, KL
k1   k2   rc          rkc         sc          skc
 5    2   5.2 ± 2.0   3.6 ± 2.0   4.9 ± 2.6   3.1 ± 1.8
 5    4   5.6 ± 1.8   3.6 ± 1.9   4.4 ± 1.2   3.5 ± 1.4
10    2   4.0 ± 1.8   2.5 ± 1.0   4.4 ± 2.8   2.7 ± 1.7
10    4   5.1 ± 1.4   4.0 ± 1.7   5.2 ± 1.7   3.7 ± 1.3
20    2   3.6 ± 1.8   2.3 ± 0.9   3.2 ± 1.5   2.1 ± 0.5
20    4   5.2 ± 1.9   3.5 ± 1.8   4.3 ± 1.6   2.8 ± 1.2






Table 3: (i) Improvement of CoTeC and SiTeC variants upon 'r' in %; the respective reference value ($J_2$ for 'r') is given in the x column of each r row. (ii) Average number of SiTeC iterations.
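(ii) Euc. (data set label and first rows missing in the source)
k1   k2   rc           rkc         sc          skc
10    3   15.9 ± 4.6   3.4 ± 1.3   4.8 ± 1.2   2.9 ± 1.0
20    3   12.3 ± 3.4   2.9 ± 1.3   4.7 ± 1.3   2.9 ± 0.9

(ii) KL (data set label and first rows missing in the source)
k1   k2   rc           rkc          sc           skc
 ⋯    ⋯   ⋯            ⋯            ⋯            7.8 ± 4.1
10    3   18.3 ± 3.4   7.2 ± 2.6    12.1 ± 3.5   9.3 ± 4.6
20    3   18.9 ± 2.5   12.0 ± 4.5   11.1 ± 3.1   10.3 ± 2.9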



(i) Leukemia, Euc.
k1   k2        x            xk     xc     xkc
 3    2    r   7.61·10^4    5.48   5.77   6.74
           s   0.17         5.54   5.73   6.78
 3    3    r   7.57·10^4    6.53   7.18   7.75
           s   0.14         6.79   6.77   7.79
50    2    r   7.30·10^4    3.79   5.97   7.25
           s   0.33         3.75   5.54   7.25
50    3    r   7.15·10^4    4.90   7.34   8.93
           s   0.60         5.00   8.00   9.06
75    2    r   7.26·10^4    3.66   5.67   6.89
           s   0.02         3.67   5.23   6.88
75    3    r   7.09·10^4    4.59   7.09   8.47
           s   0.60         4.61   7.05   8.52

(i) Leukemia, KL
k1   k2        x            xk     xc     xkc
 3    2    r   1.82·10^-1   5.11   7.15   7.52
           s   0.36         4.93   7.19   7.51
 3    3    r   1.81·10^-1   6.00   8.13   8.76
           s   0.44         6.08   8.18   8.76
50    2    r   1.71·10^-1   3.81   7.58   7.60
           s   -0.21        3.65   7.32   7.35
50    3    r   1.68·10^-1   4.74   9.31   9.35
           s   1.08         5.16   9.70   9.75
75    2    r   1.71·10^-1   3.36   6.92   6.95
           s   -0.35        2.85   6.60   6.30
75    3    r   1.66·10^-1   4.48   9.04   9.11
           s   0.69         4.25   8.66   8.68

(ii) Leukemia, Euc.
k1   k2   rc          rkc         sc          skc
 3    2   3.8 ± 1.3   2.0 ± 0.0   3.3 ± 0.8   2.0 ± 0.0
 3    3   4.5 ± 1.5   2.2 ± 0.4   3.8 ± 1.1   2.1 ± 0.3
50    2   3.3 ± 1.1   2.0 ± 0.0   2.3 ± 1.3   2.0 ± 0.0
50    3   3.3 ± 0.8   2.0 ± 0.0   3.7 ± 1.1   2.0 ± 0.0
75    2   3.1 ± 0.9   2.0 ± 0.0   3.3 ± 1.1   2.0 ± 0.0
75    3   3.6 ± 0.9   2.0 ± 0.0   3.4 ± 1.0   2.0 ± 0.0

(ii) Leukemia, KL
k1   k2   rc          rkc         sc          skc
 3    2   7.6 ± 3.5   4.5 ± 3.2   8.0 ± 2.9   4.6 ± 3.2
 3    3   7.4 ± 2.5   5.1 ± 1.7   7.3 ± 3.0   4.7 ± 1.4
50    2   5.4 ± 1.8   3.4 ± 0.7   5.7 ± 2.5   3.3 ± 0.5
50    3   6.2 ± 2.0   4.5 ± 0.8   5.5 ± 1.0   4.6 ± 1.1
75    2   5.3 ± 1.8   3.4 ± 1.2   5.6 ± 2.2   3.2 ± 0.5
75    3   5.6 ± 1.4   4.2 ± 0.6   4.9 ± 1.1   4.1 ± 0.3



Table 4: (i) Improvement of CoTeC and SiTeC variants upon 'r' in %; the respective reference value ($J_2$ for 'r') is given in the x column of each r row. (ii) Average number of SiTeC iterations.

(i) Mll, Euc.
k1   k2        x           xk      xc      xkc
 3    3    r   6.52·10^4   10.54   11.26   11.45
           s   1.41        10.62   11.20   11.46
50    3    r   5.83·10^4   8.15    12.33   13.21
           s   1.12        8.23    12.35   13.17
75    3    r   5.75·10^4   ⋯       11.69   12.52
           s   0.84        7.86    11.68   12.52

(ii) Mll, Euc.
k1   k2   rc          rkc         sc          skc
 3    3   4.2 ± 1.2   2.0 ± 0.3   3.8 ± 1.0   2.0 ± 0.5
50    3   4.7 ± 1.9   2.0 ± 0.0   4.2 ± 1.5   2.1 ± 0.3
75    3   4.4 ± 1.4   2.0 ± 0.0   4.3 ± 1.5   2.0 ± 0.0


